String dtype: fix alignment sorting in case of python storage #59448

Merged

Conversation

jorisvandenbossche
Member

Follow-up to the implementation of the object-dtype version of the NaN string dtype in #58451. Expanding test coverage for this dtype in #59437 surfaced a lot of alignment failures:

In [1]: pd.options.future.infer_string = True

In [2]: a = Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
   ...: b = Series([2, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

In [3]: b.align(a)
Out[3]: 
(a    2.0
 b    NaN
 d    1.0
 e    NaN
 c    NaN
 dtype: float64,
 a    1.0
 b    1.0
 d    NaN
 e    NaN
 c    1.0
 dtype: float64)

The result above is not sorted, although alignment normally sorts the joined index by default.
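For comparison, here is a minimal sketch of the sorted result one would expect from the same two Series. It deliberately leaves `pd.options.future.infer_string` at its default, on the assumption that the default index dtype in an installed pandas release does not hit the problem described here.

```python
import numpy as np
import pandas as pd

# Same data as in the session above.
a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([2, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

# align() defaults to join="outer", so both results share the union index.
left, right = b.align(a)

# With working alignment, the shared index is sorted: a, b, c, d, e.
print(list(left.index))
print(list(right.index))
```

The label `"c"` exists only in `a`, so it shows up as `NaN` in the aligned copy of `b`, but it should sit between `"b"` and `"d"` rather than at the end.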

The underlying issue is that idx1.union(idx2, sort=True) does not sort for the string dtype (I will open a separate issue for that). We only reached that code path, however, because we did not correctly indicate that the faster libjoin path can be used for this dtype.
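A rough sketch of why taking the libjoin path would give a sorted result, assuming it behaves like a merge join over two indexes that are already sorted and unique. This is a conceptual illustration in pure Python, not pandas' actual libjoin code; the function name `outer_join_sorted` is hypothetical.

```python
def outer_join_sorted(left, right):
    """Merge two sorted, unique label lists into their sorted union.

    Returns (joined, left_indexer, right_indexer). An indexer value of -1
    marks a label that is missing on that side.
    """
    joined, lidx, ridx = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            # Label present on both sides.
            joined.append(left[i])
            lidx.append(i)
            ridx.append(j)
            i += 1
            j += 1
        elif left[i] < right[j]:
            # Label only on the left side.
            joined.append(left[i])
            lidx.append(i)
            ridx.append(-1)
            i += 1
        else:
            # Label only on the right side.
            joined.append(right[j])
            lidx.append(-1)
            ridx.append(j)
            j += 1
    # Drain whichever side still has labels left.
    while i < len(left):
        joined.append(left[i])
        lidx.append(i)
        ridx.append(-1)
        i += 1
    while j < len(right):
        joined.append(right[j])
        lidx.append(-1)
        ridx.append(j)
        j += 1
    return joined, lidx, ridx


# The two indexes from the example: b's index on the left, a's on the right.
joined, lidx, ridx = outer_join_sorted(["a", "b", "d", "e"], ["a", "b", "c", "d"])
print(joined)  # ['a', 'b', 'c', 'd', 'e']
print(lidx)    # [0, 1, -1, 2, 3]
print(ridx)    # [0, 1, 2, 3, -1]
```

Because both inputs are walked in order, the union comes out sorted as a side effect of the merge, with no separate sort step needed afterwards.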

xref #54792

@jorisvandenbossche jorisvandenbossche added the Strings (String extension data type and string data) label Aug 8, 2024
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review August 8, 2024 11:48
@mroeschke mroeschke added this to the 3.0 milestone Aug 8, 2024
@mroeschke mroeschke merged commit 3182e9b into pandas-dev:main Aug 8, 2024
45 checks passed
@mroeschke
Member

Thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the string-dtype-align-python branch August 8, 2024 15:31
WillAyd pushed a commit that referenced this pull request Aug 13, 2024
* String dtype: fix alignment sorting in case of python storage

* add test
@jorisvandenbossche jorisvandenbossche modified the milestones: 3.0, 2.3 Aug 20, 2024
jorisvandenbossche added a commit that referenced this pull request Oct 9, 2024
* String dtype: fix alignment sorting in case of python storage

* add test
Labels
backported, Strings (String extension data type and string data)
2 participants